We present a framework that takes a concurrent program composed of unsynchronized processes,
along with a temporal specification of their global concurrent behaviour, and automatically generates 
a concurrent program with synchronization ensuring correct global behaviour. Our methodology 
supports finite-state concurrent programs composed of processes that may have local and shared 
variables, may be straight-line or branching programs, may be ongoing or terminating, and may 
have program-initialized or user-initialized variables. The specification language is an extension of 
propositional Computation Tree Logic (CTL) that enables easy specification of safety and liveness 
properties over control and data variables. The framework also supports synthesis of synchronization 
at different levels of abstraction and granularity. 

1 Introduction 

Shared-memory concurrent programs are ubiquitous in today's era of multi-core processors. Unfortu- 
nately, these programs are hard to write and even harder to verify. We assert that one can simplify the 
design and analysis of (shared-memory) concurrent programs by, first, manually writing synchronization- 
free concurrent programs, followed by, automatically synthesizing the synchronization code necessary 
for ensuring the programs' correct concurrent behaviour. This particular approach to synthesis of concurrent
programs was first developed in Q O and was revisited more recently in [13, 24, 25]. The
early synthesis papers focused on propositional temporal logic specifications and restricted models of 
concurrent programs such as synchronization skeletons. Even when dealing with finite-state programs, 
it is highly cumbersome to express properties over functions and predicates of program variables using 
propositional temporal logic. Besides, synchronization skeletons that suppress data variables and compu- 
tations are often inadequate abstractions of real-world concurrent programs. The more recent synthesis 
approaches have fairly sophisticated program models. However, they are applicable for restricted classes 
of specifications such as safety properties, and entail some possibly restrictive assumptions. For instance, 
it is almost always assumed that all data variables are initialized within the program to specific values, 
thereby disallowing any kind of user or environment input to a concurrent program. The presence of 
local data variables is also rarely accounted for or treated explicitly. Finally, there has been limited effort 
in developing adaptable synthesis frameworks that are capable of generating synchronization at different 
levels of abstraction and granularity. 

In this paper, we present a comprehensive treatment of synthesis of synchronization for concurrent 
programs with CTL-like specifications over program variables. We support finite-state concurrent pro- 
grams composed of processes that may have local and shared variables, may be straight-line or branching, 
may be ongoing or terminating, and may be executed as a closed system (with no external environment) 
or with an external environment that may initialize the values of the program variables or read the values
of the program variables at any point in the programs' execution. We propose an extension to propositional
CTL that helps express properties over program locations and data variables. These properties may
be syntactic, e.g., $AG\,\neg(loc_1 = l_1 \wedge loc_2 = l_2)$, specifying that the first and the second process cannot
simultaneously be in locations $l_1$ and $l_2$, respectively, or semantic, e.g., $AG\,(v_1 = \mu \Rightarrow AF(v_2 = \mu + 1))$,
specifying that if the value of variable $v_1$ is $\mu$, then it is inevitable that the value of variable $v_2$ be $\mu + 1$, or
both syntactic and semantic. Furthermore, as is evident from the above examples, these properties may 
express safety as well as liveness requirements. Finally, we support the synthesis of synchronization 
in the form of conditional critical regions (CCRs), or based on lower-level synchronization primitives 
such as locks and condition variables. In the latter case, the synthesized synchronization can be either 
coarse-grained or fine-grained. 

Given a concurrent program $P$ composed of synchronization-free processes $P_1, P_2, \ldots, P_k$, and a
temporal logic specification $\phi_{spec}$ specifying the expected concurrent behaviour, the goal is to obtain
synchronized processes $P_1^s, \ldots, P_k^s$ such that the concurrent program $P^s$ resulting from their asynchronous
composition satisfies $\phi_{spec}$. This is effected in several steps in our proposed approach. The first
step involves specifying the concurrency and operational semantics of the unsynchronized processes as
a temporal logic formula $\phi_P$. We help mitigate the user's burden of specification-writing by automatically
generating $\phi_P$. The second step involves construction of a tableau $T_\phi$ for $\phi$ given by $\phi_P \wedge \phi_{spec}$.
If the overall specification is found to be satisfiable, the tableau yields a global model $M$, based on
$P_1, P_2, \ldots, P_k$, such that $M \models \phi$. The next step entails decomposition of $M$ into the desired synchronized
processes $P_1^s, \ldots, P_k^s$ with synchronization in the form of CCRs. The last step comprises a mechanical
compilation of the synthesized CCRs into both coarse-grained and fine-grained synchronization code
based on locks and condition variables.

To construct the tableau $T_\phi$, we adapt the tableau-construction for propositional CTL to our extended
specification language over variables, functions and predicates. When there exist environment-initialized
variables, we present an initial brute-force solution for modifying the basic approach to ensure that $P^s$
satisfies $\phi_{spec}$ for all possible initial values of such variables. Also, we address the effect of local variables
on the permitted behaviours in $P^s$ due to limited observability of global states, and discuss solutions.

The paper is structured as follows. We begin by introducing our specification language and program
model in Sec. 2. We present a basic algorithmic framework in Sec. 3, focussing on the formulation of $\phi_P$,
tableau construction, model generation and extraction of CCRs. We then address extensions of the basic
framework to deal with uninitialized variables, local variables, different synchronization primitives and
multiple processes in Sec. 4. We conclude with a discussion of related and future work in Sec. 5.

2 Formal Framework 
2.1 A vocabulary L 

Symbols of L: We fix a vocabulary L that includes a set $L^V$ of variable symbols (denoted $v$, $v_1$ etc.), a
set $L^F$ of function symbols (denoted $f$, $f_1$ etc.), a set $L^B$ of predicate symbols (denoted $B$, $B_1$ etc.), and
a non-empty set $L^S$ of sorts. $L^S$ contains the special sort bool, along with the special sort location.
Each variable $v$ has associated with it a sort in $L^S$, denoted $sort(v)$. Each function symbol $f$ has an
associated arity and a sort: $sort(f)$ for an $m$-ary function symbol is an $(m+1)$-tuple $\langle \sigma_1, \ldots, \sigma_m, \sigma \rangle$
of sorts in $L^S$, specifying the sorts of both the domain and range of $f$. Each predicate symbol $B$ also
has an associated arity and sort: $sort(B)$ for an $m$-ary predicate symbol is an $m$-tuple $\langle \sigma_1, \ldots, \sigma_m \rangle$
of sorts in $L^S$. Constant symbols (denoted $c$, $c_1$ etc.) are identified as the 0-ary function symbols, with
each constant symbol $c$ associated with a sort, denoted $sort(c)$, in $L^S$. The vocabulary L also explicitly
includes the distinguished equality predicate symbol =, used for comparing elements of the same sort.

Syntax of L-terms and L-atoms: Given any set of variables $V \subseteq L^V$, we inductively construct the set of
L-terms and L-atoms over $V$, using sorted symbols, as follows:

• Every variable of sort $\sigma$ is a term of sort $\sigma$.

• If $f$ is a function symbol of sort $\langle \sigma_1, \ldots, \sigma_m, \sigma \rangle$, and $t_j$ is a term of sort $\sigma_j$ for $j \in [1, m]$, then
$f(t_1, \ldots, t_m)$ is a term of sort $\sigma$. In particular, every constant of sort $\sigma$ is a term of sort $\sigma$.

• If $B$ is a predicate symbol of sort $\langle \sigma_1, \ldots, \sigma_m \rangle$, and $t_j$ is a term of sort $\sigma_j$ for $j \in [1, m]$, then
$B(t_1, \ldots, t_m)$ is an atom.

• If $t_1$, $t_2$ are terms of the same sort, $t_1 = t_2$ is an atom.

Semantics of L-terms and L-atoms: Given any set of variables $V \subseteq L^V$, an interpretation $I$ of symbols of
L, and L-terms and L-atoms over $V$ is a map satisfying the following:

• Every sort $\sigma \in L^S$ is mapped to a nonempty domain $D_\sigma$. In particular, the sort bool is mapped
to the Boolean domain $D_{bool} = \{T, F\}$, and the sort location is mapped to a domain of control
locations in a program.

• Every variable symbol $v$ of sort $\sigma$ is mapped to an element $v^I$ in $D_\sigma$.

• Every function symbol $f$ of sort $\langle \sigma_1, \ldots, \sigma_m, \sigma \rangle$ is mapped to a function $f^I : D_{\sigma_1} \times \ldots \times D_{\sigma_m} \to D_\sigma$.
In particular, every constant symbol $c$ of sort $\sigma$ is mapped to an element $c^I \in D_\sigma$.

• Every predicate symbol $B$ of sort $\langle \sigma_1, \ldots, \sigma_m \rangle$ is mapped to a function $B^I : D_{\sigma_1} \times \ldots \times D_{\sigma_m} \to D_{bool}$.

Given an interpretation $I$ as defined above, the valuation $val^I[t]$ of an L-term $t$ and the valuation
$val^I[G]$ of an L-atom $G$ are defined as follows:

• For a term $t$ which is a variable $v$, the valuation is $v^I$.

• For a term $f(t_1, \ldots, t_m)$, the valuation $val^I[f(t_1, \ldots, t_m)] = f^I(val^I[t_1], \ldots, val^I[t_m])$.

• For an atom $G(t_1, \ldots, t_m)$, the valuation $val^I[G(t_1, \ldots, t_m)] = T$ iff $G^I(val^I[t_1], \ldots, val^I[t_m]) = T$.

• For an atom $t_1 = t_2$, $val^I[t_1 = t_2] = T$ iff $val^I[t_1] = val^I[t_2]$.

In the rest of the paper, we assume that the interpretation of constant, function and predicate symbols 
in L is known and fixed. We further assume that the interpretation of sort symbols to specific domains is 
known and fixed. With some abuse of notation, we shall denote the interpretation of all constant, function 
and predicate symbols simply by the symbol name, and identify sorts with their domains. Examples 
of some constant, function and predicate symbols that may be included in L are: constant symbols
0, 1, 2, function symbols $+$, $-$, and predicate symbols $<$, $>$ over the integers, function symbols $\vee$, $\neg$ over
bool, the constant symbol $\phi$ (empty list), function symbol • (appending lists) and predicate symbol null
(emptiness test) over lists, etc. Finally, when the interpretation is obvious from the context, we denote
the valuations $val^I[t]$, $val^I[G]$ of terms $t$ and atoms $G$ simply as $val[t]$, $val[G]$, respectively.

2.2 Concurrent Programs

In our framework, we consider a (shared-memory) concurrent program to be an asynchronous composi- 
tion of a non-empty, finite set of processes, equipped with a finite set of program variables that range over 
finite domains. We assume a simple concurrent programming language with assignment, condition test, 
unconditional goto, sequential and parallel composition, and the synchronization primitive - conditional 
critical region (CCR) [12, 10]. A concurrent program $P$ is written using the concurrent programming language,
in conjunction with L-terms and L-atoms. We assume that the sets of (data and control) variables,
functions and predicates available for writing $P$ are each finite subsets of $L^V$, $L^F$ and $L^B$, respectively.

A concurrent program is given as $P :: [declaration]\ [P_1 \| \ldots \| P_k]$, with $k > 0$. The declaration
consists of a finite sequence of declaration statements, specifying the set of shared data variables $X$,
their domains, and possibly initializing them to specific values. For example, the declaration statement,
$v_1, v_2 : \{0, 1, 2, 3\}$ with $v_1 = 0$, declares two variables $v_1$, $v_2$, each with (a finite integer) domain
$\{0, 1, 2, 3\}$, and initializes the variable $v_1$ to the value 0. The initial value of any uninitialized variable is
assumed to be a user/environment input from the domain of the variable.

A process $P_i$ consists of a declaration of local data variables $Y_i$ (similar to the declaration of shared
data variables in $P$), and a finite sequence of labeled, atomic instructions, $l : inst$. We denote the unique
instruction at location $l$ as $inst(l)$. The set of data variables $Var_i$ accessible by $P_i$ is given by $X \cup Y_i$.
The set of labels or locations of $P_i$ is denoted $L_i = \{l_i^0, \ldots, l_i^{n_i}\}$, with $l_i^0$ being a designated start location.
Unless specified otherwise¹, an atomic instruction $inst$ is an assignment, condition test, unconditional
goto, or CCR. An assignment instruction $A$, given by $(v_{i_1}, \ldots, v_{i_q}) := (t_1, \ldots, t_q)$, is a parallel assignment
of L-terms $t_1, \ldots, t_q$, over $Var_i$, to the data variables $v_{i_1}, \ldots, v_{i_q}$ in $Var_i$. Upon completion, an assignment
statement at $l_i^j$ transfers control to the next location $l_i^{j+1}$. A condition test, if (G) $l_{if}$, $l_{else}$, consists of
an L-atom $G$ over $Var_i$, and a pair of locations $l_{if}$, $l_{else}$ in $L_i$ to transfer control to if $G$ evaluates to T,
F, respectively. The instruction goto $l$ is a transfer of control to location $l \in L_i$. A CCR is a guarded
instruction block, $G \to inst\_block$, where the enabling guard $G$ is an L-atom over $Var_i$ and $inst\_block$ is
a sequence of assignment, conditional and goto statements. The guard $G$ is evaluated atomically and
if found to be T, the corresponding $inst\_block$ is executed atomically, and control is transferred to the
next location. If $G$ is found to be F, the process waits at the same location till $G$ evaluates to T. An
unsynchronized process does not contain CCRs.

¹ A user may define an atomic instruction (block) as a sequence of assignment, conditional and goto statements.

We model the asynchronous composition of concurrent processes by the nondeterministic interleaving
of their atomic instructions. Hence, at each step of the computation, some process, with an enabled
transition, is nondeterministically selected to be executed next by a scheduler. The set of program
variables is denoted $V = Loc \cup Var$, where $Loc = \{loc_1, \ldots, loc_k\}$ is the set of control variables and
$Var = Var_1 \cup \ldots \cup Var_k$ is the set of data variables. The semantics of the concurrent program $P$ is given
by a transition system $(S, S^0, R)$, where $S$ is a set of states, $S^0 \subseteq S$ is a set of initial states and $R \subseteq S \times S$ is
the transition relation. Each state $s \in S$ is a valuation of the program variables in $V$. We denote the value
of variable $v$ in state $s$ as $val^s[v]$, and the corresponding value of a term $t$ and an atom $G$ in state $s$ as $val^s[t]$
and $val^s[G]$, respectively. $val^s[t]$ and $val^s[G]$ are defined inductively as in Sec. 2.1. The domain of each
control variable $loc_i \in V$ is the set of locations $L_i$, and the domain of each data variable is determined from
its declaration. The set of initial states $S^0$ corresponds to all states $s$ with $val^s[loc_i] = l_i^0$ for all $i \in [1, k]$,
and $val^s[v] = v_{init}$, for every data variable $v$ initialized in its declaration to some constant $v_{init}$. There
exists a transition from state $s$ to $s'$ in $R$, with $val^s[loc_i] = l_i$, $val^{s'}[loc_i] = l_i'$ and $val^{s'}[loc_j] = val^s[loc_j]$
for all $j \neq i$, iff there exists a corresponding local move in process $P_i$ involving instruction $inst(l_i)$, such
that:

1. $inst(l_i)$ is the assignment instruction $(v_{i_1}, \ldots, v_{i_q}) := (t_1, \ldots, t_q)$, for each variable $v_{i_j}$ with $j \in [1, q]$:
$val^{s'}[v_{i_j}] = val^s[t_j]$, for all other data variables $v$: $val^{s'}[v] = val^s[v]$, and $l_i'$ is the next location in $P_i$
after $l_i$, or,

2. $inst(l_i)$ is the condition test if (G) $l_{if}$, $l_{else}$, the valuation of all data variables in $s'$ is the same as
that in $s$, and either $val^s[G]$ is T and $l_i' = l_{if}$, or $val^s[G]$ is F and $l_i' = l_{else}$, or,

3. $inst(l_i)$ is goto $l$, the valuation of all data variables in $s'$ is the same as that in $s$, and $l_i' = l$, or,

4. $inst(l_i)$ is the CCR $G \to inst\_block$, $val^s[G]$ is T, the valuation of all data variables in $s'$ corresponds
to the atomic execution of $inst\_block$ from state $s$, and $l_i'$ is the next location in $P_i$ after $l_i$.

We assume that $R$ is total. For terminating processes $P_i$, we assume that $P_i$ ends with a special instruction,
halt : goto halt.
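
The interleaving semantics above can be sketched as a successor function over global states. The two-process program, the instruction encoding and the names PROGRAM, successors and reachable are hypothetical and chosen only for illustration; CCRs are omitted since unsynchronized processes do not contain them.

```python
# Hypothetical two-process program over one shared variable v in {0, 1, 2, 3}.
# Instruction encodings (assumed for this sketch):
#   ("assign", var, fn)          var := fn(data), control moves to the next location
#   ("if", guard, l_if, l_else)  condition test
#   ("goto", l)                  unconditional jump
PROGRAM = [
    {0: ("assign", "v", lambda d: (d["v"] + 1) % 4), 1: ("goto", 0)},
    {0: ("if", lambda d: d["v"] == 2, 1, 0), 1: ("goto", 1)},  # location 1 acts as halt
]


def successors(state):
    """Return (process index, next state) for every local move from a global state."""
    locs, data = state  # locs: tuple of locations; data: sorted (variable, value) pairs
    d = dict(data)
    moves = []
    for i, proc in enumerate(PROGRAM):
        inst = proc[locs[i]]
        new_d = dict(d)
        if inst[0] == "assign":
            _, var, fn = inst
            new_d[var] = fn(d)
            nxt = locs[i] + 1
        elif inst[0] == "if":
            _, guard, l_if, l_else = inst
            nxt = l_if if guard(d) else l_else
        else:  # goto
            nxt = inst[1]
        new_locs = locs[:i] + (nxt,) + locs[i + 1:]
        moves.append((i + 1, (new_locs, tuple(sorted(new_d.items())))))
    return moves


def reachable(init):
    """Collect all global states reachable from init by interleaving local moves."""
    seen, frontier = {init}, [init]
    while frontier:
        s = frontier.pop()
        for _, t in successors(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen


init = ((0, 0), (("v", 0),))
states = reachable(init)
```

Each move changes the control variable of exactly one process, which mirrors the requirement that the locations of all other processes are unchanged along a transition.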

2.3 Specifications 

Our specification language, LCTL, is an extension of propositional CTL, with formulas composed from
L-atoms. While one can use propositional CTL for specifying properties of finite-state programs, LCTL 
enables more natural specification of properties of concurrent programs communicating via typed shared 
variables. We describe the syntax and semantics of this language below. 

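Syntax: Given a set of variables $V \subseteq L^V$, we inductively construct the set of (LCTL) formulas over $V$,
using L-atoms, in conjunction with the propositional operators $\neg$, $\vee$ and the temporal operators A, E, X, U,
along with the process-indexed next-time operator $X_i$:

• Every L-atom over $V$ is a formula.

• If $\phi_1$, $\phi_2$ are formulas, then so are $\neg\phi_1$ and $\phi_1 \vee \phi_2$.

• If $\phi_1$, $\phi_2$ are formulas, then so are $EX\phi_1$, $EX_i\phi_1$, $A[\phi_1 U \phi_2]$ and $E[\phi_1 U \phi_2]$.

We use the following standard abbreviations: $\phi_1 \wedge \phi_2$ for $\neg(\neg\phi_1 \vee \neg\phi_2)$, $\phi_1 \Rightarrow \phi_2$ for $\neg\phi_1 \vee \phi_2$, $\phi_1 \Leftrightarrow \phi_2$
for $(\phi_1 \Rightarrow \phi_2) \wedge (\phi_2 \Rightarrow \phi_1)$, $AX\phi$ for $\neg EX\neg\phi$, $AX_i\phi$ for $\neg EX_i\neg\phi$, $AF\phi$ for $A[T\,U\,\phi]$, $EF\phi$ for $E[T\,U\,\phi]$,
$EG\phi$ for $\neg AF\neg\phi$, and $AG\phi$ for $\neg EF\neg\phi$.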

Semantics: LCTL formulas over a set of variables $V$ are interpreted over models of the form $M =
(S, R, L)$, where $S$ is a set of states and $R$ is a total, multi-process, binary relation $R = \bigcup_i R_i$ over $S$,
composed of the transitions $R_i$ of each process $P_i$. $L$ is a labeling function that assigns to each state $s \in S$
a valuation of all variables in $V$. The value of a term $t$ in a state $s \in S$ of $M$ is denoted as $val^{(M,s)}[t]$,
and is defined inductively as in Sec. 2.1. A path in $M$ is a sequence $\pi = (s_0, s_1, \ldots)$ of states such that
$(s_j, s_{j+1}) \in R$, for all $j \geq 0$. We denote the $j^{th}$ state in $\pi$ as $\pi_j$.

The satisfiability of an LCTL formula in a state $s$ of $M$ can be defined as follows:

• $M, s \models G(t_1, \ldots, t_m)$ iff $G(val^{(M,s)}[t_1], \ldots, val^{(M,s)}[t_m]) = T$.

• $M, s \models t_1 = t_2$ iff $val^{(M,s)}[t_1] = val^{(M,s)}[t_2]$.

• $M, s \models \neg\phi$ iff it is not the case that $M, s \models \phi$.

• $M, s \models \phi_1 \vee \phi_2$ iff $M, s \models \phi_1$ or $M, s \models \phi_2$.

• $M, s \models EX\phi$ iff for some $s_1$ such that $(s, s_1) \in R$, $M, s_1 \models \phi$.

• $M, s \models EX_i\phi$ iff for some $s_1$ such that $(s, s_1) \in R_i$, $M, s_1 \models \phi$.

• $M, s \models A[\phi_1 U \phi_2]$ iff for all paths $\pi$ starting at $s$, $\exists j\,[M, \pi_j \models \phi_2$ and $\forall k\,(k < j \Rightarrow M, \pi_k \models \phi_1)]$.

• $M, s \models E[\phi_1 U \phi_2]$ iff there exists a path $\pi$ starting at $s$ such that $\exists j\,[M, \pi_j \models \phi_2$ and $\forall k\,(k < j \Rightarrow
M, \pi_k \models \phi_1)]$.

Programs as Models: A program $P = (S, S^0, R)$ can be viewed as a model $M = (S, R, L)$, with the same
set of states and transitions as $P$, and the identity labeling function $L$ that maps a state to itself. Given an
LCTL specification $\phi$, we say $P \models \phi$ iff for each state $s \in S^0$, $M, s \models \phi$.
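
For a finite model, the semantic clauses for the temporal operators can be evaluated by standard fixpoint iteration over sets of states. The sketch below is only an illustration of those clauses, not part of the synthesis procedure; the three-state model, the relation names R1 and R2, and the helper names are assumptions, and R is assumed total as in the text.

```python
def successors_of(R, s):
    """States reachable from s in one step of the relation R (a set of pairs)."""
    return {t for (u, t) in R if u == s}


def sat_ex(states, R, target):
    """States satisfying EX phi (or EX_i phi when R is the relation R_i)."""
    return {s for s in states if successors_of(R, s) & target}


def sat_eu(R, sat1, sat2):
    """States satisfying E[phi1 U phi2], as a least fixpoint."""
    result = set(sat2)
    while True:
        new = {s for s in sat1 - result if successors_of(R, s) & result}
        if not new:
            return result
        result |= new


def sat_au(R, sat1, sat2):
    """States satisfying A[phi1 U phi2], as a least fixpoint (R assumed total)."""
    result = set(sat2)
    while True:
        new = {s for s in sat1 - result
               if successors_of(R, s) and successors_of(R, s) <= result}
        if not new:
            return result
        result |= new


# Hypothetical model: three states, transitions split by process.
STATES = {0, 1, 2}
LABEL = {0: {"v": 0}, 1: {"v": 1}, 2: {"v": 0}}
R1 = {(0, 1)}
R2 = {(0, 2), (1, 1), (2, 2)}
R = R1 | R2

v_is_1 = {s for s in STATES if LABEL[s]["v"] == 1}   # states satisfying the atom v = 1
ex_all = sat_ex(STATES, R, v_is_1)                   # EX (v = 1)
ex_p1 = sat_ex(STATES, R1, v_is_1)                   # EX_1 (v = 1)
ef = sat_eu(R, STATES, v_is_1)                       # EF (v = 1) = E[T U v = 1]
af = sat_au(R, STATES, v_is_1)                       # AF (v = 1) = A[T U v = 1]
```

In this model state 0 satisfies EF (v = 1) but not AF (v = 1), because the path through state 2 never reaches a state where v is 1.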



3 Basic Algorithmic Framework 

In this section, for ease of exposition, we assume a simpler program model than the one described in
Sec. 2.2. We restrict the number of concurrent processes $k$ to 2. We assume that all data variables are
initialized in the program to specific values from their respective domains. We further assume that all
program variables, including control variables, are shared variables. We explain our basic algorithmic
framework with these assumptions, and later describe extensions to handle the general program model
in Sec. 4.

Let us first review our problem definition. Given a concurrent program $P$, composed of unsynchronized
processes $P_1$, $P_2$, and an LCTL specification $\phi_{spec}$ of their desired global concurrent behaviour,
we wish to automatically generate synchronized processes $P_1^s$, $P_2^s$, such that the resulting concurrent
program $P^s \models \phi_{spec}$. If $P_1$, $P_2$ consist of atomic instructions, we wish to obtain synchronization in the form
of CCRs, with each instruction enclosed in a CCR. In particular, the goal is to synthesize the guard for
each CCR, along with any necessary (synchronization) assignments to be performed within the CCR.

We propose an automated framework to do this in several steps. 

1. Formulate an LCTL formula $\phi_P$ to specify the semantics of the concurrent program $P$.

2. Construct a tableau $T_\phi$ for the formula $\phi$ given by $\phi_P \wedge \phi_{spec}$. If $T_\phi$ is empty, declare the specification
as inconsistent and halt.

3. If $T_\phi$ is non-empty, extract a model $M$ for $\phi$ from it.

4. Decompose $M$ to obtain CCRs to synchronize each process.

In what follows, we describe these steps in more detail.



3.1 Formulation of $\phi_P$

A reader familiar with the early synthesis work in [7] will recall that the synthesis of a global model
requires a complete specification, which includes a temporal description $\phi_P$ of the concurrency and operational
semantics of the unsynchronized concurrent program $P$, along with its desired global behaviour
$\phi_{spec}$. We propose to automatically infer an LCTL formula for $\phi_P$ to help mitigate the user's burden of
specification-writing. Let $Var = \{v_1, \ldots, v_h\}$ be the set of data variables. $\phi_P$ is then formulated as the
conjunction of the following (classes of) properties:

1. Initial condition:

$val[loc_1] = l_1^0 \wedge val[loc_2] = l_2^0 \wedge \bigwedge_{v \in Var} val[v] = v_{init}$.

2. At any step, only one process can make a (local) move:

$AG \bigwedge_{j=0}^{n_1} ((val[loc_1] = l_1^j) \Rightarrow AX_2 (val[loc_1] = l_1^j)) \wedge$
$AG \bigwedge_{j=0}^{n_2} ((val[loc_2] = l_2^j) \Rightarrow AX_1 (val[loc_2] = l_2^j))$.

3. Some process can always make a (local) move:

$AG(EX_1 T \vee EX_2 T)$.

4. A statement $l_i^j : (v_{i_1}, \ldots, v_{i_q}) := (t_1, \ldots, t_q)$ in $P_i$ is formulated as:

$AG (((val[loc_i] = l_i^j) \wedge \bigwedge_{r=1}^{h} val[v_r] = \mu_r) \Rightarrow$
$AX_i ((val[loc_i] = l_i^{j+1}) \wedge \bigwedge_{r=1}^{q} val[v_{i_r}] = val[t_r] \wedge \bigwedge_{v_r \in Var \setminus \{v_{i_1}, \ldots, v_{i_q}\}} val[v_r] = \mu_r))$.

5. A statement $l_i$ : if (G) $l_{if}$, $l_{else}$ in $P_i$ is formulated as:

$AG(((val[loc_i] = l_i) \wedge (val[G] = T)) \Rightarrow AX_i (val[loc_i] = l_{if})) \wedge$
$AG(((val[loc_i] = l_i) \wedge (val[G] = F)) \Rightarrow AX_i (val[loc_i] = l_{else}))$.

6. A statement $l_i$ : goto $l$ in $P_i$ is formulated as:

$AG((val[loc_i] = l_i) \Rightarrow AX_i (val[loc_i] = l))$.

3.2 Construction of $T_\phi$

We assume the ability to evaluate L-atoms and L-terms over the set $V$ of program variables. Note that
since we restrict ourselves to a finite subset of the symbols in L, this is a reasonable assumption. Let us
further assume that the formula $\phi = \phi_P \wedge \phi_{spec}$ is in a form in which only atoms appear negated.

An elementary formula of LCTL is an atom, the negation of an atom, or a formula beginning with
$AX_i$ or $EX_i$ (we do not explicitly consider formulas beginning with $AX$ or $EX$ since $AX\psi = \bigwedge_i AX_i\psi$
and $EX\psi = \bigvee_i EX_i\psi$). All other formulas are nonelementary. Every nonelementary formula is either a
conjunctive formula $\alpha = \alpha_1 \wedge \alpha_2$ or a disjunctive formula $\beta = \beta_1 \vee \beta_2$. For example, $\psi_1 \wedge \psi_2$, $AG\psi =
\psi \wedge AXAG\psi$ are $\alpha$ formulas, and $\psi_1 \vee \psi_2$, $AF\psi = \psi \vee AXAF\psi$ are $\beta$ formulas.

The tableau $T_\phi$ for the formula $\phi$ is a finite, rooted, directed AND/OR graph with nodes labeled with
formulas such that when a node $B$ is viewed as a state in a suitable structure, $B \models \psi$ for all formulas
$\psi \in B$. The construction for $T_\phi$ is similar to the tableau-construction for propositional CTL in [7], while
accounting for the presence of L-atoms over $V$ in the nodes of $T_\phi$. Besides composite L-atoms and
LCTL formulas, each node of $T_\phi$ is labeled with simple atoms of the type $loc = l$ and $v = \mu$ identifying
the values of the control and data variables in each node. Two OR-nodes $B_1$ and $B_2$ are identified as being
equivalent if $B_1$, $B_2$ are labeled with the same simple atoms, and the conjunction of all the formulas in $B_1$
is valid iff the conjunction of all the formulas in $B_2$ is valid. Equivalence of AND-nodes can be similarly
defined. We briefly summarize the tableau construction first, before explaining the individual steps in
more detail.

1. Initially, let the root node of $T_\phi$ be an OR-node labeled with $\phi$.

2. If all nodes in $T_\phi$ have successors, go to the next step. Otherwise, pick a node $B$ without successors.
Create appropriately labeled successors of B such that: if B is an OR-node, the formulas in B are 
valid iff the formulas in some (AND-) successor node are valid, and if B is an AND-node, the 
formulas in B are valid iff the formulas in all (OR-) successor nodes are valid. Merge all equivalent 
AND-nodes and equivalent OR-nodes. Repeat this step. 

3. Delete all inconsistent nodes in the tableau from the previous step to obtain the final $T_\phi$.

Successors of OR-nodes: To construct the set of AND-node successors of an OR-node $B$, first build a
temporary tree with labeled nodes rooted at $B$, repeating the following step until all leaf nodes are only
labeled with elementary formulas. For any leaf node $C$ labeled with a non-elementary formula $\psi$: if $\psi$ is
an $\alpha$ formula, add a single child node, labeled $C \setminus \{\psi\} \cup \{\alpha_1, \alpha_2\}$, to $C$, and if $\psi$ is a $\beta$ formula, add two
child nodes, labeled $C \setminus \{\psi\} \cup \{\beta_1\}$ and $C \setminus \{\psi\} \cup \{\beta_2\}$, to $C$. Once the temporary tree is built, create
an AND-node successor $D$ for $B$, corresponding to each leaf node in the tree, labeled with the set of all
formulas appearing in the path to the leaf node from the root of the tree. If there exists an atom of the
form $v = t$ in $D$, where $t$ is an L-term, and the valuation of $t$ in $D$ is $\mu$, replace the atom $v = t$ by the
simple atom $v = \mu$.
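
The expansion of an OR-node into its AND-node successors can be sketched as follows. This is a simplified illustration under stated assumptions: formulas are nested tuples, only the operators and, or, AG and AF are treated as nonelementary, the process-indexed operator is written ("AX", i, f) for two processes, and the final replacement of atoms $v = t$ by simple atoms is omitted. The names classify and and_successors are hypothetical.

```python
def classify(f):
    """Return ("alpha", f1, f2) or ("beta", f1, f2); None for elementary formulas."""
    op = f[0]
    if op == "and":
        return ("alpha", f[1], f[2])
    if op == "or":
        return ("beta", f[1], f[2])
    if op in ("AG", "AF"):
        # AX f is expanded as AX_1 f and AX_2 f (two processes assumed).
        nxt = ("and", ("AX", 1, f), ("AX", 2, f))
        return ("alpha" if op == "AG" else "beta", f[1], nxt)
    return None  # atoms, negated atoms, AX_i and EX_i formulas


def and_successors(label):
    """Labels of the AND-node successors of an OR-node with the given label."""
    leaves = set()
    # Each entry: (label of a leaf of the temporary tree, formulas seen on its path).
    stack = [(frozenset(label), frozenset(label))]
    while stack:
        current, seen = stack.pop()
        pending = next((f for f in current if classify(f)), None)
        if pending is None:           # only elementary formulas remain
            leaves.add(seen)
            continue
        kind, f1, f2 = classify(pending)
        rest = current - {pending}
        if kind == "alpha":           # one child: both conjuncts
            stack.append((rest | {f1, f2}, seen | {f1, f2}))
        else:                         # two children: one per disjunct
            stack.append((rest | {f1}, seen | {f1}))
            stack.append((rest | {f2}, seen | {f2}))
    return leaves


p = ("atom", "v = 1")
q = ("atom", "v = 2")
disj = ("or", p, q)
ag = ("AG", p)
succ_disj = and_successors({disj})   # two AND-nodes, one per disjunct
succ_ag = and_successors({ag})       # a single AND-node
```

The expansion terminates because every step replaces a nonelementary formula either by proper subformulas or by elementary next-time formulas.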

Successors of AND-nodes: To construct the set of OR-node successors of an AND-node $B$, create an
OR-node labeled with $\{\psi\}$ for each $EX_i\psi$ formula in $B$ and label the transition to the OR-node with $i$.
Furthermore, label each such OR-node $D$ (with an $i$-labeled transition into $D$) with $\psi_j$ for each $AX_i\psi_j$
formula in $B$. If there exists an atom of the form $v = t$ in $D$, where $t$ is an L-term, and the valuation of $t$
in $D$ is $\mu$, replace the atom $v = t$ by the simple atom $v = \mu$. Note that the requirement that some process
can always move ensures that there will be some successor for every AND-node.

Deletion rules: All nodes in the tableau that do not meet all criteria for a tableau for $\phi$ are identified as
inconsistent and deleted as follows:

1. Delete any node $B$ which is internally inconsistent, i.e., the conjunction of all non-temporal elementary
formulas in $B$ evaluates to F.

2. Delete any node all of whose original successors have been deleted.

3. Delete any node $B$ such that $E[\psi_1 U \psi_2] \in B$, and there does not exist some path to an AND-node
$D$ from $B$ with $\psi_2 \in D$, and $\psi_1 \in C$ for all AND-nodes $C$ in the path.

4. Delete any node $B$ such that $A[\psi_1 U \psi_2] \in B$, and there does not exist a full sub-DAG² such that for
all its frontier nodes $D$, $\psi_2 \in D$ and for all its non-frontier nodes $C$, $\psi_1 \in C$.

If the root node of the tableau is deleted, we halt and declare the specification $\phi$ as inconsistent
(unsatisfiable). If not, we proceed to the next step.

3.3 Obtaining a model $M$ from $T_\phi$

A model $M$ is obtained by joining together model fragments rooted at AND-nodes of $T_\phi$: each model
fragment is a rooted DAG of AND-nodes embeddable in $T_\phi$ such that all eventuality formulas labeling
the root node are fulfilled in the fragment. We do not explain this step in more detail, as it is identical to
the procedure in [7].³ After extracting $M$ from $T_\phi$, we modify the labels of the states of $M$ by eliminating
all labels other than simple atoms, identifying the values of the program variables in each state of $M$. If
there exist $n$ states $s_1, \ldots, s_n$ with the exact same labels after this step, we introduce an auxiliary variable
$x$ with domain $\{0, 1, 2, \ldots, n\}$ to distinguish between the states: $x$ is assumed to be 0 in all states other
than $s_1, \ldots, s_n$; for each $j \in \{1, \ldots, n\}$, we set $x$ to $j$ in transitions into $s_j$, and set $x$ back to 0 in transitions
out of $s_j$. This completes the model generation. $M$ is guaranteed to satisfy $\phi$ by construction.

² A full sub-DAG $T'$ is a directed acyclic sub-graph of a tableau $T$, rooted at a node of $T$ such that all OR-nodes in $T'$ have
exactly one (AND-node) successor from $T$ in $T'$, and all AND-nodes in $T'$ either have no successors in $T'$, or have all their
(OR-node) successors from $T$ in $T'$.

³ There may be multiple models embedded in $T_\phi$. In [7], in order to construct model fragments, whenever there are multiple
sub-DAGs rooted at an OR-node $B$ that fulfill the eventualities labeling $B$, one of minimal size is chosen, where the size of a
sub-DAG is defined as the length of its longest path. There are other valid criteria for choosing models, exploring which is beyond
the scope of this paper.



3.4 Decomposition of $M$ into $P_1^s$ and $P_2^s$

Recall that $P_1$ and $P_2$ are unsynchronized processes with atomic instructions such as assignments, condition
tests and gotos, and no CCRs. In this last step of our basic algorithmic framework, we generate $P_1^s$
and $P_2^s$ consisting of CCRs, enclosing each atomic instruction of $P_1$ and $P_2$.

Without loss of generality, consider location $l_1$ in $P_1$. The guard for the CCR for $inst(l_1)$ in $P_1^s$
corresponds to all states in $M$ in which $inst(l_1)$ is enabled, i.e., states in which $P_1$ is at location $l_1$ and
from which there exists a $P_1$ transition. To be precise, $inst(l_1)$ is enabled in state $s$ in $M$ iff there exists
a transition $(s, s') \in R$ such that $val^s[loc_1] = l_1$, $val^{s'}[loc_1] = l_1'$ with $l_1'$ being a valid next location for
$P_1$, and $val^s[loc_2] = val^{s'}[loc_2]$. The guard $G_s$ corresponding to such a state $s$ is the valuation of all
program variables other than $loc_1$ in state $s$. Thus, if $val^s[loc_2] = l_2$ and for all $v_j \in Var = \{v_1, \ldots, v_h\}$,
$val^s[v_j] = \mu_j$, then $G_s$ is given by $(loc_2 = l_2) \wedge \bigwedge_{j=1}^{h} (v_j = \mu_j)$.

If $M$ does not contain an auxiliary variable, then the CCR for $inst(l_1)$ in $P_1^s$ is simply $G_{1,1} \to inst(l_1)$,
where $G_{1,1}$ is the disjunction of guards $G_s$ corresponding to all states $s$ in $M$ in which $inst(l_1)$ is enabled.
However, if $M$ contains an auxiliary variable $x$ (with domain $\{0, 1, 2, \ldots, n\}$), then one may also need to
perform updates to $x$ within the CCR instruction block. In particular, if $inst(l_1)$ is enabled in state $s$ in
$M$, transition $(s, s')$ in $M$ is a $P_1$ transition, and if there is an assignment $x := j$ for some $j \in \{0, \ldots, n\}$
along transition $(s, s')$, then besides $inst(l_1)$, the instruction block of the CCR for $inst(l_1)$ in $P_1^s$ includes
instructions in our programming language corresponding to: if ($G_s$) $x := j$.

The synchronized process $P_1^s$ (and similarly $P_2^s$) can be generated by inserting a similarly generated
CCR at each location in $P_1$ (and $P_2$). The modified concurrent program $P^s$ is given by $P^s ::
[declaration]\ [P_1^s \| P_2^s]$, where the declaration includes auxiliary variable $x$ with domain $\{0, 1, 2, \ldots, n\}$
if $M$ contains $x$ with domain $\{0, 1, 2, \ldots, n\}$.
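
The guard computation can be illustrated with a small sketch. The model below is a hypothetical mutual-exclusion-style example with no data variables and no auxiliary variable; the transition encoding and the names state, ccr_guard and format_guard are assumptions made for this illustration only.

```python
def state(**valuation):
    """A state as a hashable, sorted tuple of (variable, value) pairs."""
    return tuple(sorted(valuation.items()))


def ccr_guard(transitions, proc, loc_var, location):
    """Collect the guards G_s of all states in which inst(location) of proc is enabled.

    transitions: iterable of (process index, source state, target state).
    Each guard is the valuation of every variable other than loc_var in the source state.
    """
    guards = set()
    for p, s, _ in transitions:
        d = dict(s)
        if p == proc and d[loc_var] == location:
            guards.add(tuple((k, v) for k, v in s if k != loc_var))
    return guards


def format_guard(guards):
    """Render the disjunction of guards as text."""
    return " or ".join(
        "(" + " and ".join(f"{k} = {v}" for k, v in g) + ")" for g in sorted(guards)
    )


# Hypothetical model M: each process alternates between N (non-critical) and C (critical),
# and no state has both processes in their critical locations.
M = [
    (1, state(loc1="N1", loc2="N2"), state(loc1="C1", loc2="N2")),
    (2, state(loc1="N1", loc2="N2"), state(loc1="N1", loc2="C2")),
    (1, state(loc1="C1", loc2="N2"), state(loc1="N1", loc2="N2")),
    (2, state(loc1="N1", loc2="C2"), state(loc1="N1", loc2="N2")),
]

guard_n1 = ccr_guard(M, 1, "loc1", "N1")   # guard of the CCR at location N1 of P1
guard_text = format_guard(guard_n1)
```

Here the instruction at N1 is enabled only in the state where the second process is at N2, so the synthesized guard blocks the first process while the second is in its critical location.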



3.5 Correctness and Complexity 

The following theorems assert the correctness of our basic algorithmic framework for synthesizing synchronization
for unsynchronized processes $P_1$, $P_2$, as defined in Sec. 2.2, with the restriction that all
program variables are shared variables that are initialized to specific values.

Theorem 1 Given unsynchronized processes $P_1$, $P_2$ and an LCTL formula $\phi_{spec}$, if our basic algorithm
generates $P^s$, then $P^s \models \phi_{spec}$.

Theorem 2 Given unsynchronized processes $P_1$, $P_2$, and an LCTL formula $\phi_{spec}$, if the temporal specification
$\phi = \phi_{spec} \wedge \phi_P$ is consistent as a whole, then our method constructs $P^s$ such that $P^s \models \phi_{spec}$.

The complexity of our method is exponential in the size of $\phi$, i.e., exponential in the size of $\phi_{spec}$ and
the number of program variables in $V$.



4 Extensions 

In this section, we demonstrate the adaptability of our basic algorithmic framework by considering more
general program models. In particular, we discuss extensions for synthesizing correct synchronization in
the presence of uninitialized variables and local variables. Furthermore, we extend our framework to programming
languages with locks and wait/signal over condition variables by presenting an automatic
compilation of CCRs into synchronization code based on these lower-level synchronization primitives.
We conclude with an extension of the framework to multiple processes.



4.1 Uninitialized Variables 

In Sec. 3, we assumed that all data variables are initialized to specific values over their domains. This
assumption may not be satisfied in general as it disallows any kind of user or environment input to a
concurrent program. In the program model presented in Sec. 2, only some (or even none) of the data
variables may be initialized to specific values within the program. This is a more realistic setting, which 
allows a user or environment to choose the initial values of the remaining data variables. In this subsec- 
tion, we present a simple, brute-force extension of our basic algorithm for synthesizing synchronization 
in the presence of uninitialized variables. 

The formula $\phi_P$, expressing the concurrency and operational semantics of $P$, remains the same, except
for the initial condition. Instead of a single initial state, the initial condition in $\phi_P$ specifies the
set of all possible initial states, with the control and initialized data variables set to their initial
values, and the remaining data variables ranging over all possible values in their respective domains.
Let us denote by $Var_{inp}$ this remaining set of data variables, which are, essentially, inputs to the
program $P$. The set of program-initialized data variables is then $Var \setminus Var_{inp}$. The initial
condition in $\phi_P$ is expressed as:

$$\bigwedge_{i} \big(val[loc_i] = l_i^0\big) \;\wedge\; \bigwedge_{v \in Var \setminus Var_{inp}} (v = v_{init}) \;\wedge\; \bigwedge_{v \in Var_{inp}} \; \bigvee_{v_i \in D_v} (v = v_i),$$

where $D_v$ is the domain of $v$.
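The set of initial states described by this condition can be enumerated directly. The following is a minimal Python sketch, assuming states are represented as dictionaries; the function name `initial_states` and the variable names `loc1`, `loc2`, `y`, `a`, `b` are illustrative and do not come from the paper.

```python
from itertools import product

def initial_states(init_locs, init_vals, input_domains):
    """Enumerate every state allowed by the initial condition.

    init_locs:     initial value of each control variable (fixed)
    init_vals:     initial value of each program-initialized data variable (fixed)
    input_domains: domain D_v of each input variable v in Var_inp (free)
    """
    names = sorted(input_domains)
    states = []
    # One initial state per choice of values for the input variables.
    for choice in product(*(input_domains[n] for n in names)):
        state = dict(init_locs)
        state.update(init_vals)
        state.update(zip(names, choice))
        states.append(state)
    return states

# Hypothetical program: y is program-initialized, a and b are inputs.
states = initial_states(
    init_locs={"loc1": 0, "loc2": 0},
    init_vals={"y": 0},
    input_domains={"a": [0, 1], "b": [0, 1, 2]},
)
```

The number of initial states is the product of the input domain sizes, which is why the extension described here is called brute-force.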

The root node of the tableau is now an AND-node with multiple OR-node successors, each corresponding
to a particular valuation $\upsilon$ of all the data variables (the values of the control variables and
initialized data variables are the same in any such valuation). Each such OR-node yields a model
$M_\upsilon$ for the formula $\phi$, and a corresponding decomposition of $M_\upsilon$ into synchronized
processes $P_1^\upsilon$ and $P_2^\upsilon$.

To generate synchronized processes $P_1^s$ and $P_2^s$ such that for all possible initial valuations
$\upsilon$ of the data variables, $P^s \models \phi_{spec}$, we propose to unify the CCRs corresponding to each
valuation $\upsilon$ as follows:

1. Introduce a new variable $v_0$ for every input data variable $v$ in $Var_{inp}$. Declare $v_0$ as a
variable with the same domain as $v$. Assign $v_0$ the input value of $v$.

2. Replace every CCR guard $G$ in $P_i^\upsilon$ with the guard $G_\upsilon$ given by
$\bigwedge_{v \in Var_{inp}} (v_0 = \upsilon_v) \wedge G$, where the valuation of $v$ in $\upsilon$ is
$\upsilon_v$. Similarly, update every conditional guard accompanying an auxiliary variable assignment
within a CCR instruction block in $P_i^\upsilon$.

3. The unified guard for each CCR in $P_1^s$ and $P_2^s$ is given by the disjunction of the corresponding
guards $G_\upsilon$ in all $P_1^\upsilon$ and $P_2^\upsilon$. The unified conditional guards for auxiliary
variable updates in the CCR instruction blocks are computed similarly.
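Steps 2 and 3 can be illustrated with a small Python sketch, assuming guards are represented as predicates over a state dictionary and that the saved copy of an input variable `n` is stored under the key `n0`. The helper names `tag_guard` and `unify`, and the example guards, are hypothetical.

```python
def tag_guard(valuation, guard):
    """Step 2: conjoin guard G with the test that the saved inputs equal the valuation."""
    def tagged(state):
        inputs_match = all(state[name + "0"] == value
                           for name, value in valuation.items())
        return inputs_match and guard(state)
    return tagged

def unify(tagged_guards):
    """Step 3: the unified guard is the disjunction of the tagged guards."""
    return lambda state: any(g(state) for g in tagged_guards)

# Hypothetical example: one input variable n with domain {0, 1}.
# For n = 0 the synthesized guard is x == 0; for n = 1 it is x >= 1.
unified = unify([
    tag_guard({"n": 0}, lambda s: s["x"] == 0),
    tag_guard({"n": 1}, lambda s: s["x"] >= 1),
])
```

Because the saved copies are never modified after initialization, exactly one disjunct can be enabled in any run, so the unified CCR behaves like the CCR synthesized for the actual input valuation.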



Note that the unified guards inferred above, as well as in Sec. 3.4, may not in general be pleasant.
However, since each guard is expected to be an L-term over a finite set of variable, function and
predicate symbols with known interpretations, it is possible to obtain a simplified L-term with the same
value as the guard. This translation is beyond the scope of this paper, but we refer the reader to [14]
for a similar approach.

4.2 Local Variables

Another assumption in Sec. 3 was that all program variables, including control variables, were shared
variables. Since one typically associates a cost with each shared variable access, it is impractical to
expect all program variables to be shared variables. This is especially true of control variables, which
are generally never declared explicitly or accessed in programs. Thus, the guards inferred in Sec. 3.4,
ranging over locations of the other process, are somewhat irregular. Indeed, any guard for a process
$P_i$ must only be defined over the data variables $Var_i$ accessible by $P_i$. In what follows, we discuss
various solutions to address this issue.

Let us assume that we have a model $M = (S, R, L)$ for $\phi$, with states labeled by the valuations of the
control variables $Loc$, the shared data variables $X$, the local data variables $Y = \bigcup_i Y_i$, and
possibly an auxiliary variable $x$. For the purpose of this subsection, let $x$ be included in the set $X$.
We first check if the set of states $S$ of $M$ has the property that for any two states $s_1$, $s_2$ in $S$:

$$\Big[\bigwedge_{loc \in Loc} val_{s_1}[loc] = val_{s_2}[loc] \;\wedge\; \bigwedge_{y \in Y} val_{s_1}[y] = val_{s_2}[y]\Big] \;\Leftrightarrow\; \bigwedge_{x \in X} val_{s_1}[x] = val_{s_2}[x].$$

If this is true, then each state $s \in S$ is uniquely identified by its valuation of the shared data
variables $X$. We can then simply factor out guards from $M$ for each process that only range over $X$,
without missing out on any permitted behaviour in $M$. If this is not true, we can perform other similar
checks. For instance, we can check if for a particular $i$: any two states in $S$ match in their valuations
of the variables $\{loc_i\} \cup Y_i \cup X$ iff they match in their valuations of the other program
variables. If this is true, then the process $P_i$ can distinguish between states in $S$ by the valuations
of its variables $Var_i \cup \{loc_i\}$. Thus, we can infer guards for $P_i$ that are equivalent to the
guards inferred in Sec. 3.4 but only range over $Var_i$.
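Both checks amount to asking whether a chosen subset of variables identifies every state of $M$. A minimal Python sketch of this test follows, assuming states are dictionaries from variable names to values; since the states of $M$ are distinct valuations, only the direction "agreement on the observed variables implies agreement everywhere" needs to be tested. The function name `identified_by` and the example states are illustrative.

```python
def identified_by(states, observed):
    """Return True if no two distinct states agree on all observed variables."""
    seen = {}
    for s in states:
        key = tuple(s[v] for v in observed)
        if key in seen and seen[key] != s:
            return False  # two different states look identical to the observer
        seen[key] = s
    return True

# Hypothetical model with one control variable, one local and one shared variable.
model_states = [
    {"loc1": 0, "y1": 0, "x": 0},
    {"loc1": 1, "y1": 0, "x": 1},
    {"loc1": 1, "y1": 1, "x": 1},
]
shared_only = identified_by(model_states, ["x"])            # first check: X alone
process_view = identified_by(model_states, ["loc1", "y1", "x"])  # second check for P_1
```

In this example the shared variable alone does not separate the last two states, while the variables visible to the first process do, so guards for that process can be stated over its own variables.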

In general, however, there will be states $s_1$, $s_2$ in $S$ which cannot be distinguished by the
valuations of a particular process's variables or, worse, of any process's variables. This general
situation presents us with a trade-off between synchronization cost and concurrency: we can introduce
additional shared variables to distinguish between such states, thereby increasing the synchronization
cost and allowing more behaviours of $M$ to be preserved in $P^s$; or, we can resign ourselves to limited
observability [24] of global states, resulting in lower synchronization cost and fewer permitted
behaviours of $M$. In particular, for the latter case, we implement a safe subset of the behaviours of $M$
by inferring synchronization guards corresponding to the negation of variable valuations (states) that
are not present in $M$. Since a global state $u \notin M$ may be indistinguishable over some $Var_i$ from a
state $s \in M$, when eliminating behaviours rooted at $u$, we also eliminate all (good) behaviours of $M$
rooted at $s$. We refer the reader to [24] for a detailed treatment of this trade-off.
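The loss of good behaviours under limited observability can be seen in a small Python sketch. It assumes states are dictionaries and that a guard blocks every observation that matches some state outside $M$; the names `project` and `safe_guard`, and the example states, are hypothetical and are not taken from [24].

```python
def project(state, observable):
    """Restrict a global state to the variables a process can observe."""
    return tuple(sorted((v, state[v]) for v in observable))

def safe_guard(bad_states, observable):
    """Guard that blocks any observation matching a state not present in M."""
    blocked = {project(u, observable) for u in bad_states}
    return lambda state: project(state, observable) not in blocked

good = {"x": 1, "y2": 0}   # a state s in M
bad = {"x": 1, "y2": 1}    # a state u not in M

# The process sees only x: s and u are indistinguishable, so s is blocked as well.
coarse = safe_guard([bad], ["x"])
# Making y2 observable (an extra shared variable) separates s from u.
fine = safe_guard([bad], ["x", "y2"])
```

The first guard is safe but removes the good behaviours rooted at `good`; the second preserves them at the price of one more shared variable access.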



4.3 Synchronization using Locks and Condition Variables 

While CCRs provide an elegant high-level synchronization solution, many programming languages prefer,
and only provide, lower-level synchronization primitives such as locks for mutual exclusion, and
wait/signal over condition variables for condition synchronization. In what follows, we present an
automatic compilation of the CCRs inferred in Sec. 3.4 for $P^s$ into both coarse-grained and fine-grained
synchronization code based on these lower-level primitives. The resulting processes are denoted as
$P_1^c$, $P_2^c$ (coarse-grained) and $P_1^f$, $P_2^f$ (fine-grained).

In both cases, we declare locks and condition variables for synchronization. For the program $P^c$,
which has a coarser level of lock granularity, we declare a single lock $l$ for controlling access to
shared variables and condition variables. For the program $P_1^f \parallel P_2^f$ with a finer level of lock
granularity, we declare separate locks $l_v$, $l_x$ for controlling access to each shared data variable
$v \in X$ and the shared auxiliary variable $x$, respectively. We further define a separate lock
$l_{cv_{1,i}}$, $l_{cv_{2,j}}$ for each condition variable $cv_{1,i}$, $cv_{2,j}$ to allow simultaneous
processing of different condition variables.

(a) Coarse-grained

    lock(l) {
      while (!G_{1,i})
        wait(cv_{1,i}, l);
      if (G_{1,i}^{aux1})
        x := 1;
      if (G_{1,i}^{aux0})
        x := 0;
      inst(l_1^i);
      signal(cv_{2,r});
      signal(cv_{2,s});
    }

(b) Fine-grained

    boolean Guard_{1,i}() {
      lock(l_{v_1}, l_{v_2}, ..., l_x) {
        if (G_{1,i}) {
          if (G_{1,i}^{aux1})
            x := 1;
          if (G_{1,i}^{aux0})
            x := 0;
          inst(l_1^i);
          return(true);
        }
        else return(false);
    }}

    lock(l_{cv_{1,i}}) {
      while (!Guard_{1,i}())
        wait(cv_{1,i}, l_{cv_{1,i}});
    }
    lock(l_{cv_{2,r}}) {
      signal(cv_{2,r});
    }
    lock(l_{cv_{2,s}}) {
      signal(cv_{2,s});
    }

Figure 1: Coarse-grained and fine-grained synchronization code corresponding to an example CCR at
location $l_1^i$ of $P_1$. The guards $G_{1,i}^{aux1}$ and $G_{1,i}^{aux0}$ above correspond to all states in
$M$ on which inst($l_1^i$) is enabled and there is an assignment x := 1, respectively x := 0, along a
$P_1$ transition out of the states.

We refer the reader to Fig. 1a for an example of coarse-grained synchronization code corresponding
to the CCR at location $l_1^i$ of $P_1$. Note that, for ease of presentation, we have used conventional
pseudocode instead of our programming language. Further note that we find it convenient to express locks
as lock(l){...} (in a manner similar to Java's synchronized keyword), wherein $l$ is a lock variable,
'{' denotes lock acquisition and '}' denotes lock release. This simple implementation involves acquiring
the lock $l$ and checking if the overall guard $G_{1,i}$ for executing inst($l_1^i$) is enabled. While the
guard is F, $P_1^c$ waits for it to change to T. This is implemented by associating a condition variable
$cv_{1,i}$ with the overall guard $G_{1,i}$: $P_1^c$ releases the lock $l$ and waits till $P_2^c$ signals it
that $G_{1,i}$ could be T; $P_1^c$ then reacquires the lock and rechecks the guard. If the overall guard is
T, $P_1^c$ enters the instruction block of the CCR and executes the instructions while holding the lock
$l$. Finally, $P_1^c$ sends a notification signal corresponding to every guard (i.e. condition variable) of
$P_2^c$ which may be changed to T by $P_1^c$'s shared variable updates, and releases the lock.
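The coarse-grained scheme maps closely onto Python's `threading` module, where several condition variables can share one lock. The following is a minimal sketch under that assumption; the helper `run_ccr`, the shared dictionary, and the two toy processes are illustrative and are not the paper's compilation output.

```python
import threading

lock = threading.Lock()          # the single lock l of the coarse-grained scheme
shared = {"x": 0, "log": []}     # illustrative shared state, not from the paper

def run_ccr(cv, guard, body, to_signal):
    """Execute one CCR: wait until the guard holds, run the body, notify other guards."""
    with lock:                   # lock(l) { ... }
        while not guard():       # while (!G) wait(cv, l);
            cv.wait()            # releases l while sleeping, reacquires it on wake-up
        body()                   # instruction block, executed while holding l
        for other in to_signal:  # guards of the other process that may now be true
            other.notify()

cv1 = threading.Condition(lock)  # condition variable for the guard of P1
cv2 = threading.Condition(lock)  # condition variable for the guard of P2

def p1():
    # P1 may proceed only once x == 1.
    run_ccr(cv1, lambda: shared["x"] == 1,
            lambda: shared["log"].append("P1"), [cv2])

def p2():
    def body():
        shared["x"] = 1
        shared["log"].append("P2")
    # P2 is always enabled; its update may enable the guard of P1.
    run_ccr(cv2, lambda: True, body, [cv1])

threads = [threading.Thread(target=p1), threading.Thread(target=p2)]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)
```

Regardless of which thread is scheduled first, the body of the second process runs before the body of the first, because the first process rechecks its guard under the lock after every wake-up.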

While fine-grained locking can typically be achieved by careful definition and nesting of multiple
locks, one needs to be especially cautious in the presence of condition variables for various reasons.
For instance, upon execution of wait(c, l) in a nested locking scheme, a process only releases the lock
$l$ before going to sleep, while still holding all outer locks. This can potentially lead to a deadlock.
The fine-grained synchronization code in $P_1^f$, shown in Fig. 1b, circumvents these issues by utilizing
a separate subroutine to evaluate the overall guard $G_{1,i}$. In this subroutine, $P_1^f$ first acquires
all necessary locks, corresponding to all shared variables accessed in the subroutine. These locks are
acquired in a strictly nested fashion and in a predecided fixed order to prevent deadlocks. We use
lock($l_1$, $l_2$, ...){...} to denote the nested locks lock($l_1$){ lock($l_2$){ ... }}, with $l_1$ being
the outermost lock variable. The subroutine then evaluates $G_{1,i}$ and returns its value to the main
body of $P_1^f$. If found T, the subroutine also executes the instruction block of the CCR. The
synchronization code in the main body of $P_1^f$ acquires the relevant lock $l_{cv_{1,i}}$ and calls its
guard-computing subroutine within a while loop till it returns T, after which it releases the lock. If
the subroutine returns F, the process releases $l_{cv_{1,i}}$ and waits on the associated condition
variable $cv_{1,i}$. Each notification signal for a condition variable, on which the other process may be
waiting, is sent out by acquiring the corresponding lock.
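A corresponding Python sketch of the fine-grained scheme is given below, assuming one lock per shared variable and one private lock per condition variable. The names `guarded_step`, `run_ccr`, `LOCK_ORDER` and the toy processes are hypothetical; the sketch follows the structure of Fig. 1b rather than reproducing the paper's compiler.

```python
import threading
from contextlib import ExitStack

var_locks = {"v": threading.Lock(), "x": threading.Lock()}  # one lock per shared variable
LOCK_ORDER = ["v", "x"]          # predecided fixed acquisition order
shared = {"v": 0, "x": 0}        # illustrative shared state, not from the paper

def guarded_step(names, guard, body):
    """Guard subroutine: lock the variables in fixed order, test the guard, run the body."""
    with ExitStack() as stack:
        for name in sorted(names, key=LOCK_ORDER.index):
            stack.enter_context(var_locks[name])   # nested acquisition
        if guard():
            body()
            return True
        return False

def run_ccr(cv, names, guard, body, to_signal):
    """Main body: loop on the guard subroutine, then signal under each condition's lock."""
    with cv:                     # lock(l_cv) { while (!Guard()) wait(cv, l_cv); }
        while not guarded_step(names, guard, body):
            cv.wait()            # only l_cv is held here, so nothing else stays locked
    for other in to_signal:
        with other:              # lock(l_cv') { signal(cv'); }
            other.notify()

cv1 = threading.Condition(threading.Lock())   # private lock for cv1
cv2 = threading.Condition(threading.Lock())   # private lock for cv2

def p1():
    # Waits until v == 1, then sets x := 1.
    run_ccr(cv1, ["v", "x"], lambda: shared["v"] == 1,
            lambda: shared.update(x=1), [cv2])

def p2():
    # Always enabled; sets v := 1 and signals the guard of P1.
    run_ccr(cv2, ["v"], lambda: True, lambda: shared.update(v=1), [cv1])

threads = [threading.Thread(target=p1), threading.Thread(target=p2)]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)
```

Because the waiting process holds the condition variable's lock while it evaluates the guard, and the signalling process must take that same lock to notify, a notification cannot slip in between a failed guard check and the subsequent wait.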

We emphasize certain optimizations implemented in our compilations that potentially improve the 
performance of the synthesized concurrent program: (a) declaration of condition variables only when 
necessary, and (b) sending notification signals only when some guard in the other process may have 
changed. We refer the reader to lUl for more details of this compilation. 

4.4 Multiple (k > 2) Processes 

Our basic algorithmic framework can be extended in a straightforward manner to the synthesis of
synchronization for concurrent programs with an arbitrary (but fixed) number k of processes. But since
this involves building a global model M, with size exponential in k, it exhibits a state explosion problem.
There has, however, been work |l3l 13 on improving the scalability of the approach by avoiding building 
the entire global model, and instead composing interacting process pairs to generate synchronized
processes. Hence, for k > 2 processes, we can adapt the more scalable synthesis algorithms to the
synthesis of LCTL formulas.

The compilation of CCRs into coarse-grained and fine-grained synchronization code can be extended
in a straightforward manner to k > 2 processes. We emphasize that this compilation acts on individual
processes directly, without construction or manipulation of the global model, and hence circumvents the 
state-explosion problem for arbitrary k. 

5 Discussion 

Related work: Early work on synthesis of synchronization for shared-memory concurrent programs from
temporal specifications [7] utilized a tableau-based decision procedure for extracting synchronization 
skeletons from unsynchronized process skeletons. While the core technique has great potential, the orig- 
inal work had little practical impact due to its remoteness from realistic concurrent programs and pro- 
gramming languages. The limited modeling of shared-memory concurrency in this work did not include 
local and shared data variables, and hence, could not support semantic specifications over the values of 
program variables. There was no explicit treatment of process skeletons with branching, observability of 
program counters or local variables, and no attempt to synthesize synchronization based on lower-level 
synchronization primitives. 

More recently, practically viable synthesis of synchronization has been proposed for both finite-state
[24] and infinite-state concurrent programs [25]. However, in both [24] and [25], the authors only
handle safety specifications; in fact, it can be shown that synthesis methods that rely on pruning a
global product graph ([13], [24], [25]) cannot, in general, work for liveness. Moreover, these papers do
not support any kind of external environment; in particular, these papers do not account for different
(environment-enabled) initializations of the program variables. Finally, similar to [7], these papers
only synthesize high-level synchronization in the form of CCRs [24] and atomic sections [25], and do not
attempt to synthesize synchronization based on lower-level synchronization primitives available in
commonly used programming languages.

On the other end of the spectrum, there has been some important work on automatic synthesis of 
lower-level synchronization, in the form of memory fences, for concurrent programs running on relaxed 
memory models [16], [15]. There has also been work on mapping high-level synchronization into
lower-level synchronization lH, [26]; these papers do not treat liveness properties, are not fully
algorithmic, and are verification-driven. Among papers that address refinement of locking granularity
are [4], which translates guarded commands into synchronization based on atomic reads and atomic writes,
and papers on compiler-based lock inference for atomic sections ([3], [10] etc.). The lock-inference
papers [3], [10] rely on the availability of high-level synchronization in the form of atomic sections,
and do not, in general, support condition synchronization. Sketching [22], a search-based program
synthesis technique, is a verification-driven approach, which can be used to synthesize optimized
implementations of synchronization primitives, e.g. barriers, from partial program sketches.

A note on reactive systems: A shared-memory concurrent program can also be viewed as a reactive sys- 
tem. A reactive system ifTTlfTQl is described as one that maintains an ongoing interaction with an external 
environment or within its internal concurrent modules. Such systems cannot be adequately described by 
relational specifications over initial and final states - this distinguishes them from transformational or 
relational programs. An adequate description of a reactive system must refer to its ongoing desired be- 
haviour, throughout its (possibly non-terminating) activity - temporal logic JTSl has been recognized as 
convenient for this purpose. 

A reactive system may be terminating or not, sequential or concurrent, and implemented on a
monolithic or distributed architecture. A reactive system can also be open or closed [20], [21]. This has been
a somewhat overlooked dichotomy in recent years. We have observed that it is not uncommon to view 
reactive systems exclusively as open systems; this is especially true in the context of synthesis. While the 
first algorithms on synthesis of concurrent programs QHtIIBI were proposed for closed reactive systems, 
the foundational work in [20], [21] set the stage for an extensive body of impressive results on synthesis
of open reactive systems (see [23] for a survey).

We contend that the relatively simpler problem of synthesis of closed reactive systems is an important 
problem in its own right. This is especially true in the context of shared-memory concurrent programs, 
where it is sometimes sufficient and desirable to model programs as closed systems and force the compo- 
nent processes to cooperate with each other for achieving a common goal. If one must model an external 
environment, it is also often sufficient to model the environment in a restricted manner (as in this paper) 
or optimistically assume a helpful environment (see ifDl'). 

Concluding Remarks: In this paper, we have presented a general tableau-based framework for the
synthesis of synchronization for shared-memory concurrent programs. While we have identified and
explored initial solutions for issues such as environment-initialized variables, limited observability of
local variables, and pleasantness of guards, much work remains to be done. We also wish to extend the
basic program model to handle nondeterministic programs, infinite-state programs, as well as dynamic
allocation of threads. Finally, we want to investigate techniques to reduce the overall complexity of the
method.

Acknowledgements: The author wishes to thank Jyotirmoy Deshmukh for many insightful discussions 
during the course of writing this paper, and an anonymous reviewer for pointing out interesting future 
research directions. 